Efficient incremental learning through probabilistic training set selection

ABSTRACT

Disclosed are systems and methods to incrementally train neural networks. Incrementally training the neural networks can include defining a probability distribution of labeled training examples from a training sample pool, generating a first training set based off the probability distribution, training the neural network with the first training set, adding at least one additional training sample to the training sample pool, generating a second training set, and training the neural network with the second training set. The incremental training can be recursive for additional training sets until a decision to end the recursion is made.

BACKGROUND

Machine learning is a potent tool to generate models and neural networks that are able to show relations and simulate expected behaviors of the object the model is based on. However, in situations where distributional shift is common, the neural network must continuously update to account for new data distributions, which is computationally expensive.

One manner of continuously updating or training a neural network is by incremental learning, a supervised machine learning method where a neural network repeatedly “views” input/output pairs from a training set. When training a neural network in the incremental setting, there are two obvious/naïve options: training on new examples as they become available or storing new examples as they become available to train on all available examples at each step.

Training a neural network on each example as it is encountered results in efficient learning. However, exposing the neural network only to newly encountered examples may result in a phenomenon known as “catastrophic forgetting,” where the neural network begins to forget earlier tasks unless reminded by some additional means, such as keeping the previous examples in the training set.

Storing each example to be relearned at each step can remedy catastrophic forgetting, but due to sheer volume, can prove to be computationally inefficient and expensive. For neural networks that feature dynamically changing environments and complex inputs and outputs, such as with vehicles, there is a need for a more efficient learning algorithm.

SUMMARY

Generally, a system for training a neural network through probabilistic training set selection is described. In administering incremental learning in a neural network, training set selection can be computationally expensive depending on the size of the training set to be administered. Catastrophic forgetting should also be avoided for the system to run efficiently. A system for training a neural network can build training sets based in part on weighted probabilities, where the probability is based on the neural network's exposure to specific examples. The training system can add and adjust weights to the training examples. The probability of the training set having a particular training example can be based on an inverse relationship with how often the neural network has seen the particular training example.

In some embodiments, a computer-implemented method for incrementally training a neural network is described. The method can include defining a probability distribution over a labeled sample pool, and generating a first training set by selecting a first set of labeled samples from the labeled sample pool, with the selection based in part on the probability distribution. A neural network can then be trained on the first training set. Additional labeled samples can be added to the labeled sample pool, from which a second training set can be generated, and used to further train the neural network.

In some embodiments a system for training a neural network is described. The system can include a neural network, a labeled sample pool, a processor, and a system memory. The system memory may contain instructions for the above-described computer-implemented method for incrementally training the neural network.

Implementations of the computer-implemented method may include one or more of the following features. In embodiments, the steps of adding one or more labeled samples to the pool, generating an additional training set, and training the neural network may be continuously repeated until a determination to cease is made. In embodiments, the probability distribution may be defined such that the probability of selecting any given labeled sample is inversely proportional to a number of times the sample has been previously selected. In embodiments, the probability may be a weighted random distribution. The probability may be controlled by a hyperparameter to control the sharpness of the distribution. The sharpness of the distribution can be defined to avoid a uniform distribution and avoid a one-hot distribution. In embodiments the probability distribution can be modified by a fraction of the labeled sample pool having a common dataset label. In embodiments, the training set size may be variable.

BRIEF DESCRIPTION OF THE DRAWINGS

A further understanding of the nature and advantages of various embodiments may be realized by reference to the following figures. In the appended figures, similar components or features may have the same reference label. Further, various components of the same type may be distinguished by following the reference label by a dash and a second label that distinguishes among the similar components. If only the first reference label is used in the specification, the description is applicable to any one of the similar components having the same first reference label irrespective of the second reference label.

FIG. 1 illustrates an intelligence system for training a neural network, according to embodiments.

FIG. 2 is a probability distribution for selecting labeled samples, according to embodiments.

FIG. 3 illustrates a method of training a neural network, according to embodiments.

FIG. 4 is a flowchart for incremental training of a neural network, according to embodiments.

FIG. 5 illustrates a block diagram of a vehicle system, according to some embodiments.

FIG. 6 illustrates a block diagram of a computing system, according to some embodiments.

FIG. 7 illustrates a cloud computing system, according to some embodiments.

DETAILED DESCRIPTION

Training neural networks through incremental training methods can prove computationally expensive and inefficient, due to the phenomenon of catastrophic forgetting when a neural network or model has not seen an input/output pair over a longer period of time. Older trained examples tend to be forgotten as the neural network learns and sees more and more examples. In order to account for this, a neural network can be trained such that on each training iteration, it sees every example pair. But doing so is inefficient and computationally heavy.

Labeling and decoupling the training set from the full pool of training examples can allow for more efficient training of a neural network by reducing the training set size such that the neural network does not need to go through every training example each time the neural network goes through an iteration of training. The training sets can be generated using probability distributions across a labeled pool of training examples. In this manner, catastrophic forgetting can be addressed by assigning each training example with metadata that weights the training examples based on how often the neural network has seen the training example. An inverse relationship between the recency of the neural network seeing the training example and the probability of the training example being in the training set can minimize the risk of a neural network experiencing catastrophic forgetting. Additional data can be used to further modify the probability to efficiently create additional training sets.

FIG. 1 is an exemplary intelligence system 100 for training a neural network. The intelligence system 100 includes a training system 105. The training system 105 has a neural network 110, a training sample pool 126, a labeled sample pool 128, a training module 130, and a training sample selection module 132. While FIG. 1 shows the modules and components described herein, the modules and components may be in differing configurations while still performing the same functionality, and still within the scope of this disclosure. For example, the training system may be implemented on one or more computing systems, such as, for example, computing device 600 as described with respect to FIG. 6, computing system 502, as described with respect to FIG. 5, or server 730 as described with respect to FIG. 7.

The neural network 110 may be a neural network trained by incremental, iterative learning, through unique selection of a training set. In embodiments, the neural network 110 can be a neural network for any purpose, such as document classification, data validation, research, forecasting, operating a vehicle through artificial intelligence, or any other suitable vehicle-compatible machine learning application.

The training sample pool 126 may store a training set of examples, made up of different inputs and outputs through which the neural network 110 learns. For a neural network using gradient descent, it may go through iterative cycles of training in order to provide a learned model of input/output responses. The training set stored in the training sample pool 126 can have a training set size which may be variable or set to differing sizes as determined through the training sample selection module 132.

In embodiments, the neural network may be trained iteratively, with a new training set in each recursion. The training sample pool 126 is updated to be the new training set before the neural network 110 is trained on the next training set. The training sample pool 126 may be updated until a determination is made to cease repeating the training.

The labeled sample pool 128 can be in communication with the training sample pool 126. The labeled sample pool 128 is a pool of examples for labeled training samples for training the neural network. In embodiments, the training examples can be assigned with a probability based on the exposure a particular neural network has had with the training example. In doing so, a neural network can be trained on training sets that minimize the risk of catastrophic forgetting by ensuring when a particular training example has not been seen by the neural network 110 for a longer period of time, there is an increasingly higher probability that the particular training example gets put into the training set.

Each example in the labeled sample pool 128 can be assigned a probability based off of metadata for each individual training example. For example, the various training examples can be labeled with a classification, such as traffic signs, weather conditions, vehicle types, that affect the probability of the training example being included.

The training module 130 can take a training set for the neural network 110 and train the neural network 110. In embodiments, the neural network 110 goes through incremental training via gradient descent. In other embodiments, other training tasks such as linear regression, logistic regression, classification or any other suitable training task for a neural network trained through supervised learning can be used in the training module 130.

The training sample selection module 132 can be in communication with the labeled sample pool 128 and the training module 130 to build a training set for the neural network 110. The selection of specific examples of the training set can be through a probability distribution based on the number of times the particular example has been passed through the neural network for training. Thus, for examples that have been seen more recently, a lower distribution value is assigned, and for examples that have been seen less recently, a higher distribution value is assigned. In embodiments, the probability distribution may be defined, for example, by the formulas discussed further below with respect to FIG. 2.

Moreover, by having the selection of training samples pulled with a weighted probability, the number of available examples is decoupled from the size of the training set, giving greater control over the amount of training to be done for a neural network. The additional control may allow for more computationally efficient training while avoiding problems such as catastrophic forgetting that may accompany other modes of training that look to cut down on computational power and redundancy. In embodiments the size of the training set put together by the training sample selection module 132 can be based on the total size of the labeled sample pool 128, a constant number, or any suitable training set size for the neural network 110. For example, in embodiments where the size of the training set depends on the size of the labeled sample pool 128, the training set size will scale upwards as more and more examples are added to the labeled sample pool 128.
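The two sizing options described above (a size that scales with the labeled sample pool, or a constant) can be illustrated with a minimal sketch. The function name training_set_size and the parameters fraction and constant are hypothetical and are not taken from this disclosure; other sizing rules may equally be used.

```python
from typing import Optional


def training_set_size(pool_size: int,
                      fraction: Optional[float] = None,
                      constant: Optional[int] = None) -> int:
    """Return a training set size from either a pool fraction or a constant.

    Both `fraction` and `constant` are illustrative parameters (assumptions).
    """
    if (fraction is None) == (constant is None):
        raise ValueError("specify exactly one of fraction or constant")
    if pool_size == 0:
        return 0  # nothing to select from an empty pool
    if constant is not None:
        # Constant size: independent of how large the pool has grown.
        return constant
    # Proportional size: grows as examples are added to the pool.
    return max(1, round(fraction * pool_size))


small_pool = training_set_size(200, fraction=0.5)
large_pool = training_set_size(400, fraction=0.5)
fixed = training_set_size(400, constant=64)
```

Because selection is weighted rather than exhaustive, either rule keeps the per-iteration training cost under the control of the operator instead of tying it to the total number of stored examples.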

The probability distribution of the training sample selection module 132 can also upsample based off of additional labels on the individual training examples. For example, the probability distribution can take into account specific classes of the training examples to modify the probability the training example is pulled into the training set.

In operation, a training system 105 may be run iteratively on the same neural network 110 over a period of time to both train and test the neural network 110. The neural network 110 may learn under a supervised learning model, such as by a classification or regression model. An example or a number of examples may be added to the labeled sample pool 128, where different training examples are labeled with a probability based on the number of times an example has been passed to the model for training. A training set is then built by the training sample selection module 132 based on the probability of each example in the labeled sample pool 128 and stored in the training sample pool 126. The built training set, using a distribution of the various probabilities of the samples, can be used to train the neural network 110 through the training module 130. After training the neural network 110, the training system 105 can be run iteratively on the same neural network 110 in order to further train and test the neural network 110.

In embodiments, the intelligence system 100 may train the neural network 110 iteratively, with multiple training sets. Additional training sets can be generated by adding one or more labeled examples to the labeled sample pool 128 and creating a new training set from the labeled sample pool 128 to be stored in the training sample pool 126. The intelligence system 100 may continuously repeat the generation of new training sets and training the neural network 110 with the new training sets until a determination is made to cease the iterations. The determination to cease may be made by an operator, may be programmed in the processor, as a part of the model algorithm, or any other suitable ceasing determination.

FIG. 2 shows exemplary formulas that represent probability distributions for a system, such as training system 105, for building a training set from a pool of labeled examples, such as the labeled sample pool 128.

Formula 202 is an exemplary formula to define the probability of a labeled example, e, or in other words to define the probability of a piece of data within the labeled sample pool 128. In some embodiments, the formula 202 can be used by a training sample selection module, such as training sample selection module 132. In formula 202, the probability or frequency of an example is dependent on T(e), which represents the number of times e has been used for training the neural network. Thus, in formula 202, the probability of e is inversely related to the number of times the neural network has seen the particular example.

In formula 202, c is a hyperparameter that controls the sharpness of the distribution. For example, setting c=0 yields a uniform distribution over all examples, and setting c=∞ yields a one-hot distribution. The hyperparameter c controls the extent to which the training of the neural network is affected by previously received examples versus newly received examples. In some embodiments, setting c=0 can result in catastrophic forgetting. In embodiments, the value of the parameter c may rest between 0 and ∞ in order to avoid uniform sampling and also avoid a one-hot distribution. With a uniform distribution, the training set picks randomly from all examples with no preference based on how many times the neural network has seen it. With a one-hot distribution, exclusively one sample that the neural network has seen the least is selected to train on. As a general application, both of these scenarios may be suboptimal. Thus, in some embodiments, setting c to approximately at or between 0.5 and 5, at or between approximately 0.5 to 1.5, or at approximately 1 can yield improved results.

In some embodiments, Z is a normalization term. As such, Z ensures the distribution sums to 1. Z may be calculated as the summation of the score of each example in the pool, where the score is defined by the formula (1/(1+T(e)))^c.
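As an illustration of the score and normalization term described above, the following minimal sketch computes a selection probability for each example from its training count T(e). The function name selection_probabilities and the sample counts are hypothetical; the sketch follows the textual description of formula 202 rather than reproducing FIG. 2 itself.

```python
from typing import List, Sequence


def selection_probabilities(times_trained: Sequence[int],
                            c: float = 1.0) -> List[float]:
    """Return one probability per example, given T(e) for each example."""
    # Unnormalized score per the description: (1 / (1 + T(e))) ** c
    scores = [(1.0 / (1.0 + t)) ** c for t in times_trained]
    # Z is the sum of all scores, so the probabilities sum to 1.
    z = sum(scores)
    return [s / z for s in scores]


# Three examples seen 0, 1, and 3 times (illustrative values only).
sharp = selection_probabilities([0, 1, 3], c=1.0)
# With c = 0 every score is 1, which yields a uniform distribution.
uniform = selection_probabilities([0, 1, 3], c=0.0)
```

With c=1, the never-seen example receives the largest probability and the most frequently seen example the smallest, which is the inverse relationship the description calls for.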

In embodiments, a training sample selection module may use formula 204 to determine the probability of a particular example, e. Formula 204 allows for upsampling to deal with imbalances in classification problems. In formula 204, the frequency term is modified by an additional term reflecting upsampling as determined by F(e), representing the fraction of the training samples with the same label as e. In other words, this additional term increases the likelihood of examples being selected that belong to classes having a small total number of examples.

The second term is modified by the hyperparameter c₂ which determines the sharpness of the upsampling term. In modifying the term, when c₂=0, there is no upsampling, and when c₂=∞, only examples from the smallest class are sampled. Due to this additional term in formula 204 that modifies the frequency term, a probability distribution is modified by formula 204 according to both the upsampling fraction as well as the number of times the neural network has encountered the example when selecting examples to train on.
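The combined weighting can be sketched as follows. The exact form of formula 204 appears only in FIG. 2, so this sketch assumes that the frequency term is multiplied by an upsampling term of the form (1/F(e))^c₂; that form is an assumption chosen to match the stated behavior at c₂=0 and for large c₂. The function name upsampled_probabilities and the sample labels are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def upsampled_probabilities(times_trained: Sequence[int],
                            labels: Sequence[str],
                            c: float = 1.0,
                            c2: float = 1.0) -> List[float]:
    """Weight each example by training count and by class rarity."""
    class_counts = Counter(labels)
    pool_size = len(labels)
    scores = []
    for t, label in zip(times_trained, labels):
        # Frequency term, as described for formula 202.
        frequency_term = (1.0 / (1.0 + t)) ** c
        # F(e): fraction of the pool sharing this example's label.
        fraction = class_counts[label] / pool_size
        # Assumed upsampling term: rarer classes receive larger weight.
        upsampling_term = (1.0 / fraction) ** c2
        scores.append(frequency_term * upsampling_term)
    z = sum(scores)  # normalization so the result sums to 1
    return [s / z for s in scores]


labels = ["stop_sign", "stop_sign", "stop_sign", "yield_sign"]
balanced = upsampled_probabilities([0, 0, 0, 0], labels, c=1.0, c2=1.0)
no_upsampling = upsampled_probabilities([0, 0, 0, 0], labels, c=1.0, c2=0.0)
```

In this illustration the single yield_sign example is as likely to be selected as the three stop_sign examples combined when c₂=1, while c₂=0 leaves the distribution unchanged by class membership.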

While the above formulas are disclosed for determining the probability an example is selected, other probability distributions not listed may be within the spirit of this disclosure in assigning a weighted probability to example sets in machine learning. The formulas may be applied to machine learning algorithms beyond classification, such as other supervised learning settings, like logistic regression, object detection, question answering, sequence-to-sequence, and other supervised machine learning tasks.

FIG. 3 is a method 300 of training a neural network through an incremental probabilistic training set selection. Method 300 can be performed by, for example, the intelligence system 100 with the training system 105 and the neural network 110, as described above with respect to FIG. 1.

At step 302, a probability distribution is defined over a labeled sample pool, the labeled sample pool comprising a plurality of labeled samples. For example, the labeled sample pool may be the labeled sample pool 128. The defined probability distribution may be in accordance with formula 202, formula 204, or any suitable probability distribution.

At step 304, a first training set with a training set size is generated by selecting a first set of labeled samples from the labeled sample pool based on the probability distribution. The first training set can, for example, be selected by the training sample selection module 132 and stored in a training sample pool, such as training sample pool 126. The training set size can be adjusted. For example, in some embodiments, the training set size can be based on a size of the labeled sample pool. In other embodiments, the training set size can be set to a constant.

At step 306, a neural network is trained with the first training set. The training can occur, for example, through a training module, such as training module 130. In some embodiments, this training can be performed using any desired training algorithm including, for example, gradient descent, Newton method, Conjugate gradient, Quasi-Newton method, Levenberg-Marquardt algorithm, or any other desired training algorithm.

At step 308, one or more additional labeled samples are added to the labeled sample pool. The labeled sample pool to which the one or more additional labeled samples are added can be, for example, the labeled sample pool 128.

At step 310, a second training set is generated by selecting a second set of labeled samples from the labeled sample pool based on the probability distribution, with the second training set having a second training set size. Step 310 may, in some embodiments, be a recursion of step 304. The new training set can be stored in a training sample pool, such as training sample pool 126.

At step 312, the neural network is trained with the second training set. The training can occur, for example, through a training module, such as training module 130. In some embodiments, this training can be performed using any desired training algorithm including, for example, gradient descent, Newton method, Conjugate gradient, Quasi-Newton method, Levenberg-Marquardt algorithm, or any other desired training algorithm.
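Steps 302 through 312 can be sketched end to end as follows. This is a minimal illustration under stated assumptions: the training step is a placeholder that only records what it was given, samples are drawn with replacement using the standard library's weighted random.choices, and the training counts are incremented after each round so that the second selection reflects the first. The names score, draw_training_set, and train, and the sample identifiers, are hypothetical.

```python
import random
from typing import List, Tuple

Sample = Tuple[str, str]  # (input identifier, label); hypothetical layout


def score(times_trained: int, c: float = 1.0) -> float:
    # Unnormalized weight; random.choices accepts relative weights.
    return (1.0 / (1.0 + times_trained)) ** c


def draw_training_set(counts: List[int], size: int,
                      rng: random.Random) -> List[int]:
    # Steps 304 and 310: select pool indices according to the distribution.
    weights = [score(t) for t in counts]
    return rng.choices(range(len(counts)), weights=weights, k=size)


def train(log: List[List[Sample]], pool: List[Sample],
          indices: List[int], counts: List[int]) -> None:
    # Steps 306 and 312: stand-in for a real training algorithm.
    log.append([pool[i] for i in indices])
    for i in indices:
        counts[i] += 1  # record that the example was used for training


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
pool: List[Sample] = [("img_0", "stop_sign"), ("img_1", "yield_sign")]
counts = [0, 0]                                   # step 302: T(e) per sample
training_log: List[List[Sample]] = []

first = draw_training_set(counts, size=2, rng=rng)    # step 304
train(training_log, pool, first, counts)              # step 306
pool.append(("img_2", "stop_sign"))                   # step 308
counts.append(0)
second = draw_training_set(counts, size=2, rng=rng)   # step 310
train(training_log, pool, second, counts)             # step 312
```

Because the newly added sample starts with a training count of zero, it carries the largest possible weight when the second training set is drawn.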

With reference now to FIG. 4, a flowchart illustrating another embodiment of a process 400 for training a neural network through an incremental probabilistic training set selection is shown. The neural network can be trained using an intelligence system, such as intelligence system 100 described with respect to FIG. 1.

To start the process 400, in step 401, a labeled pool and a probability distribution over the labeled pool are initialized. The labeled pool may be the labeled sample pool 128. The probability distribution may be based on the probability of each particular example, as determined by a formula, such as formula 202 or formula 204 of FIG. 2.

In step 402, new data is received. In some embodiments, the new data is received by the training system 105. The new data may be input by an operator, may be received from sensors, may be received from other models, such as vehicle simulations, or may be received from any suitable source of training data.

In step 404, the data received in step 402 is appended to the labeled pool initialized in step 401. In some embodiments, this can include the addition of the data received in step 402 to the labeled sample pool 128. In the event that the labeled sample pool 128 already contains some data, the new data can be added to the labeled sample pool 128 such that the labeled sample pool 128 subsequently contains both the already contained data and the newly added data.

In step 406, attributes of data stored in the labeled pool are identified. These identified attributes of the data stored in the labeled pool can include, for example, classifications, the number of times a particular example has been used to train a particular neural network, or otherwise. In some embodiments, these attributes can be identified based on example metadata. In some embodiments, for example, each example contained in the labeled sample pool 128 can have metadata which can identify one or several traits of the example. In some embodiments, identifying the attributes of an example stored in the labeled sample pool 128 can include retrieving the metadata of the example, and extracting information identifying one or several attributes of the example from the example's metadata.

In step 408, the probability distribution of data stored in the labeled pool is calculated. The calculation may be based on formulas that account for the number of times a particular example has been used to train a particular neural network, such as, for example, by at least one of: formula 202 and formula 204, described above in FIG. 2.

In step 410, the training set size is determined. The training set size can be set by an operator or be determined as a function of the labeled pool size. In some embodiments, the training set size can be determined by the training sample selection module 132 of the training system 105.

In step 412, the training set is selected according to the probability distribution calculated in step 408. The training set can be selected by the training sample selection module 132 of the training system 105. In some embodiments, for example, the training sample selection module 132 can utilize a sampling algorithm, such as a random number generator based algorithm, to select the training set. The training set can be stored in a training sample pool, such as training sample pool 126.
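One random number generator based algorithm that could serve for the selection in step 412 is inverse cumulative distribution sampling, sketched below. This disclosure does not specify a particular sampling algorithm or whether samples are drawn with replacement; the sketch assumes sampling with replacement, and the function name sample_indices is hypothetical.

```python
import bisect
import itertools
import random
from typing import List, Sequence


def sample_indices(probabilities: Sequence[float], size: int,
                   rng: random.Random) -> List[int]:
    """Draw `size` pool indices, with replacement, by inverse-CDF sampling."""
    # Running totals of the probabilities form the cumulative distribution.
    cumulative = list(itertools.accumulate(probabilities))
    total = cumulative[-1]
    chosen = []
    for _ in range(size):
        # rng.random() is uniform on [0, 1); scale it to the total mass.
        u = rng.random() * total
        # The first index whose cumulative mass exceeds u is the draw.
        index = bisect.bisect_right(cumulative, u)
        # Guard against floating-point round-off at the upper edge.
        chosen.append(min(index, len(cumulative) - 1))
    return chosen


rng = random.Random(42)  # fixed seed only to make the sketch repeatable
training_indices = sample_indices([0.5, 0.3, 0.2], size=4, rng=rng)
```

The returned indices identify which examples of the labeled pool are copied into the training set; an example with zero probability is never returned.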

In step 414, the model or neural network is trained by the training set. The training can occur through a training module, such as training module 130. In some embodiments, this training can be performed using any desired training algorithm including, for example, gradient descent, Newton method, Conjugate gradient, Quasi-Newton method, Levenberg-Marquardt algorithm, or any other desired training algorithm.

In step 416, the attributes of the data stored in the labeled pool are updated after training. For example, the number of times the neural network has been trained on the particular example may be incremented to reflect the most recent training. In some embodiments, these attributes may be updated in the labeled sample pool 128 by, for example, the training module 130 and/or by the training sample selection module 132.

In step 418, the trained model is outputted. The trained model may be output on to a server, on to a dongle, on to a hard drive, or any other suitable model-storage unit.

In step 420, a determination is made whether the training is complete. In some embodiments, this determination can be made by the training system 105, and specifically by the training module 130. If the training is not complete, the flowchart proceeds to step 424. If the training is complete, the flowchart proceeds to step 422, which terminates the training. The determination of whether the training is complete can be made by an operator, an automated process, or any suitable determination method. In some embodiments, it can be determined whether training is complete by comparing some attribute of the trained model, trained model performance, model training, or the like to termination criteria. In some embodiments, for example, the termination criteria can identify a number of iterations that, when reached, triggers termination of training. In another embodiment, the termination criteria can identify a model performance parameter such as, for example, an error rate. In such an embodiment, when the error rate of the trained model drops to or below an error rate identified by the termination criteria, training of the model can be terminated.
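The two example termination criteria described above, an iteration limit and an error rate threshold, can be sketched as a simple check. The class name TerminationCriteria, its fields, and the function training_complete are hypothetical; an operator decision or any other suitable determination could replace this logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TerminationCriteria:
    # Either field may be left unset; both names are illustrative.
    max_iterations: Optional[int] = None
    target_error_rate: Optional[float] = None


def training_complete(iteration: int, error_rate: float,
                      criteria: TerminationCriteria) -> bool:
    """Return True when any configured criterion has been met."""
    if (criteria.max_iterations is not None
            and iteration >= criteria.max_iterations):
        return True  # iteration count reached
    if (criteria.target_error_rate is not None
            and error_rate <= criteria.target_error_rate):
        return True  # error rate dropped to or below the threshold
    return False


criteria = TerminationCriteria(max_iterations=100, target_error_rate=0.05)
keep_going = training_complete(iteration=10, error_rate=0.20, criteria=criteria)
good_enough = training_complete(iteration=10, error_rate=0.05, criteria=criteria)
```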

If the training is not complete, then in step 424, a determination is made if there's additional data. If there is not additional data, the flowchart may wait until further additional data is received in step 426 before proceeding back to step 404. If additional data is already present, the flowchart may proceed directly to step 404.
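The overall flow of process 400 can be sketched as a loop, under several simplifying assumptions: incoming data is modeled as an iterable of batches (so waiting for data in step 426 becomes waiting for the next batch), the training algorithm is supplied as a callback, the distribution follows the description of formula 202, selection uses the standard library's weighted random.choices with replacement, and termination uses only an iteration limit. Outputting the trained model in step 418 is omitted. All function and parameter names are hypothetical.

```python
import random
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

Example = Tuple[str, str]  # (input identifier, label); hypothetical layout


def run_incremental_training(
    batches: Iterable[Sequence[Example]],
    train_fn: Callable[[List[Example]], None],
    size_fn: Callable[[int], int],
    c: float = 1.0,
    max_iterations: Optional[int] = None,
    seed: int = 0,
) -> List[int]:
    """Run the loop and return T(e) for every example in the pool."""
    rng = random.Random(seed)
    pool: List[Example] = []        # step 401: labeled pool
    times_trained: List[int] = []   # step 401: per-example attribute T(e)
    iterations = 0
    for batch in batches:           # steps 402 and 426: receive new data
        pool.extend(batch)          # step 404: append to the labeled pool
        times_trained.extend([0] * len(batch))
        if not pool:
            continue                # nothing to train on yet
        # Steps 406 and 408: relative weights; choices() normalizes them.
        weights = [(1.0 / (1.0 + t)) ** c for t in times_trained]
        size = size_fn(len(pool))   # step 410: training set size
        chosen = rng.choices(range(len(pool)), weights=weights, k=size)
        train_fn([pool[i] for i in chosen])  # steps 412 and 414
        for i in chosen:            # step 416: update attributes
            times_trained[i] += 1
        iterations += 1
        # Steps 420 and 422: stop once the iteration limit is reached.
        if max_iterations is not None and iterations >= max_iterations:
            break
    return times_trained


seen_batches: List[List[Example]] = []
final_counts = run_incremental_training(
    batches=[[("img_0", "stop_sign"), ("img_1", "yield_sign")],
             [("img_2", "stop_sign")]],
    train_fn=seen_batches.append,   # placeholder for a real training module
    size_fn=lambda pool_size: 2,    # constant training set size
)
```

In this sketch the training cost of each iteration depends only on the value returned by size_fn, not on how many examples have accumulated in the pool.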

Any suitable computing system or group of computing systems can be used for performing the operations or methods described herein. For example, FIG. 5 illustrates a vehicle system including a computing system 502 as well as multiple ECUs which may perform some or all of the functions described herein. FIG. 6 further depicts an example of a computing device 600 that may be at least a portion of computing system 502.

FIG. 5 illustrates a block diagram of a vehicle system 500, according to some embodiments. The vehicle system 500 may include a computing system 502 configured to communicate over an in-vehicle network 514. The computing system 502 includes a processor 504 and storage 506. While a vehicle system 500 is shown in FIG. 5, the example components as illustrated are not intended to be limiting. Indeed, the vehicle system 500 may have more or fewer components, and additional or alternative components and/or implementations may be used. It should be noted that the use of a vehicle system 500 environment is illustrative, as the components and/or functionality may be utilized in other types of systems such as flight control system in an airplane, or a medical device or industrial machine.

The vehicle system 500 may include various types of automobile, crossover utility vehicle (CUV), sport utility vehicle (SUV), truck, recreational vehicle (RV), boat, plane or other mobile machine for transporting people or goods. In many cases, the vehicle system 500 may be powered by an internal combustion engine. As another possibility, the vehicle system 500 may be a hybrid electric vehicle (HEV) powered by both an internal combustion engine and one or more electric motors, such as a series hybrid electric vehicle (SHEV), a parallel hybrid electrical vehicle (PHEV), or a parallel/series hybrid electric vehicle (PSHEV). As the type and configuration of the vehicle system 500 may vary, the capabilities of the vehicle system may correspondingly vary. As some other possibilities, vehicle system 500 may have different capabilities with respect to passenger capacity, towing ability and capacity, and storage volume.

The computing system 502 may include a Human Machine Interface (HMI) 512 and a display 528 for user interaction with the computing system 502. An example computing system 502 may be the SYNC™ system provided by FORD MOTOR COMPANY™ of Dearborn, Mich. In some examples the display 528 may include a vehicle infotainment system including one or more displays. The HMI 512 may be configured to support voice command and BLUETOOTH™ interfaces with the driver and driver carry-on devices, receive user input via various buttons or other controls, and provide vehicle status information to a driver or other vehicle system 500 occupants. For instance, the computing system 502 may interface with one or more buttons or other HMI 512 configured to invoke functions on the computing system 502 (e.g., steering wheel audio buttons, a push-to-talk button, instrument panel controls, etc.). The computing system 502 may also drive or otherwise communicate with the display 528 configured to provide visual output to vehicle occupants, e.g., by way of a video controller. In some cases, the display 528 may be a touch screen further configured to receive user touch input via the video controller, while in other cases the display 528 may be a display only, without touch input capabilities. In an example, the display 528 may be a head unit display included in a center console area of the vehicle system 500. In another example, the display 528 may be a screen of a gauge cluster of the vehicle system 500.

The computing system 502 may further include various types of computing apparatus in support of performance of the functions of the computing system 502 described herein. In an example, the computing system 502 may include one or more processors 504 configured to execute computer instructions, and a storage 506 medium on which computer-executable instructions and/or data may be maintained. A computer-readable medium (also referred to as a processor-readable medium or storage 506) includes any non-transitory (e.g., tangible) medium that participates in providing data (e.g., instructions) that may be read by a computer (e.g., by the one or more processors 504). In general, the processor 504 receives instructions and/or data, e.g., from the storage 506, etc., to a memory and executes the instructions using the data, thereby performing one or more processes, including one or more of the processes described herein. Computer-executable instructions may be compiled or interpreted from computer programs created using a variety of programming languages and/or technologies, including, without limitation, and either alone or in combination, Java, C, C++, C#, Fortran, Pascal, Visual Basic, Python, JavaScript, Perl, PL/SQL, etc. The storage 506 may include divisions for data 508 and applications 510. The data 508 may store information such as databases and other such information. The applications 510 may store the computer-executable instructions or other such instructions executable by the processor 504.

The computing system 502 may be configured to communicate with mobile devices of the vehicle system 500 occupants. The mobile devices may be any of various types of portable computing device, such as cellular phones, tablet computers, smart watches, laptop computers, portable music players, or other devices capable of communication with the computing system 502. As with the computing system 502, the mobile device may include one or more processors configured to execute computer instructions, and a storage medium on which the computer-executable instructions and/or data may be maintained. In some examples, the computing system 502 may include a wireless transceiver (e.g., a BLUETOOTH™ controller, a ZIGBEE™ transceiver, a Wi-Fi transceiver, etc.) configured to communicate with a compatible wireless transceiver of the mobile device. Additionally, or alternately, the computing system 502 may communicate with the mobile device over a wired connection, such as via a USB connection between the mobile device and a Universal Serial Bus (USB) subsystem of the computing system 502.

The computing system 502 may be further configured to communicate with other components of the vehicle system 500 via one or more in-vehicle networks 514. The in-vehicle networks 514 may include one or more of a vehicle controller area network (CAN), an Ethernet network, or a Media Oriented Systems Transport (MOST) network, as some examples. The in-vehicle networks 514 may allow the computing system 502 to communicate with other units of the vehicle system 500, such as ECU A 520, ECU B 522, ECU C 524, and ECU D 526. The ECUs 520, 522, 524, and 526 may include various electrical or electromechanical systems of the vehicle system 500 or control various subsystems of the vehicle system 500. Some non-limiting examples of ECUs include a powertrain control module configured to provide control of engine operating components (e.g., idle control components, fuel delivery components, emissions control components, etc.) and monitoring of engine operating components (e.g., status of engine diagnostic codes); a body control module configured to manage various power control functions such as exterior lighting, interior lighting, keyless entry, remote start, and point of access status verification (e.g., closure status of the hood, doors and/or trunk of the vehicle system 500); a radio transceiver module configured to communicate with key fobs or other vehicle system 500 devices; a climate control management module configured to provide control and monitoring of heating and cooling system components (e.g., compressor clutch and blower fan control, temperature sensor information, etc.); as well as a transmission control module, a brake control module, a central timing module, a suspension control module, a vehicle modem (which may not be present in some configurations), a global positioning system (GPS) module configured to provide vehicle system 500 location and heading information, and various other vehicle ECUs configured to cooperate with the computing system 502.
The subsystems controlled by the various ECUs may include functional components 516 of the vehicle system 500 including elements such as the powertrain, engine, brakes, lights, steering components, and the like. Additionally, some or all of the functional components 516 may include sensors 518 as well as additional sensors equipped to the vehicle system 500 for detecting various states, positions, proximity, temperature, and the like of the vehicle system 500 and subsystems thereof. The ECUs 520, 522, 524, 526 may communicate with the computing system 502 as well as the functional components 516 and the sensors 518 over the in-vehicle network 514. While only four ECUs are depicted in FIG. 5, any number (more or fewer) of ECUs may be included in vehicle system 500.

FIG. 6 illustrates a block diagram of an example of a computing device 600. Computing device 600 can be any of the computers described herein including, for example, computing system 502 within the vehicle system 500 of FIG. 5 as well as ECUs 520, 522, 524, 526. The computing device 600 can be or include, for example, an integrated computer, a laptop computer, desktop computer, tablet, server, or other electronic device.

The computing device 600 can include a processor 640 interfaced with other hardware via a bus 605. A memory 610, which can include any suitable tangible (and non-transitory) computer readable medium, such as RAM, ROM, EEPROM, or the like, can embody program components (e.g., program code 615) that configure operation of the computing device 600. Memory 610 can store the program code 615, program data 617, or both. In some examples, the computing device 600 can include input/output (“I/O”) interface components 625 (e.g., for interfacing with a display 645, keyboard, mouse, and the like) and additional storage 630.

The computing device 600 executes program code 615 that configures the processor 640 to perform one or more of the operations described herein. Examples of the program code 615 include, in various embodiments, the logic flowchart described with respect to FIG. 1 above. The program code 615 may be resident in the memory 610 or any suitable computer-readable medium and may be executed by the processor 640 or any other suitable processor.

The computing device 600 may generate or receive program data 617 by virtue of executing the program code 615. For example, sensor data, trip counter, authenticated messages, trip flags, and other data described herein are all examples of program data 617 that may be used by the computing device 600 during execution of the program code 615.

The computing device 600 can include network components 620. Network components 620 can represent one or more of any components that facilitate a network connection. In some examples, the network components 620 can facilitate a wireless connection and include wireless interfaces such as IEEE 802.11, BLUETOOTH™, or radio interfaces for accessing cellular telephone networks (e.g., a transceiver/antenna for accessing CDMA, GSM, UMTS, or other mobile communications network). In other examples, the network components 620 can be wired and can include interfaces such as Ethernet, USB, or IEEE 1394.

Although FIG. 6 depicts a computing device 600 with a processor 640, the system can include any number of computing devices 600 and any number of processors 640. For example, multiple computing devices 600 or multiple processors 640 can be distributed over a wired or wireless network (e.g., a Wide Area Network, Local Area Network, or the Internet). The multiple computing devices 600 or multiple processors 640 can perform any of the steps of the present disclosure individually or in coordination with one another.

In some embodiments, the functionality provided by the computing device 600 may be offered as cloud services by a cloud service provider. For example, FIG. 7 depicts an example of a cloud computing system 700 offering an intelligence service that can be used by a number of user subscribers using user devices 725 a, 725 b, and 725 c across a data network 720. User devices 725 a, 725 b, and 725 c could be examples of a vehicle system 500 described above. In the example, the intelligence service may be offered under a Software as a Service (SaaS) model. One or more users may subscribe to the intelligence service, and the cloud computing system 700 performs the processing to provide the intelligence service to subscribers. The cloud computing system 700 may include one or more remote server computers 705.

The remote server computers 705 include any suitable non-transitory computer-readable medium for storing program code (e.g., server 730) and program data 710, or both, which is used by the cloud computing system 700 for providing the cloud services. A computer-readable medium can include any electronic, optical, magnetic, or other storage device capable of providing a processor with computer-readable instructions or other program code. Non-limiting examples of a computer-readable medium include a magnetic disk, a memory chip, a ROM, a RAM, an ASIC, optical storage, magnetic tape or other magnetic storage, or any other medium from which a processing device can read instructions. The instructions may include processor-specific instructions generated by a compiler or an interpreter from code written in any suitable computer-programming language, including, for example, C, C++, C#, Visual Basic, Java, Python, Perl, JavaScript, and ActionScript. In various examples, the server computers 705 can include volatile memory, non-volatile memory, or a combination thereof.

One or more of the server computers 705 execute the program data 710 that configures one or more processors of the server computers 705 to perform one or more of the operations that determine locations for interactive elements and operate the adaptive rule-based system. As depicted in the embodiment in FIG. 7, the one or more server computers 705 provide the services to perform the adaptive rule-based system via the server 730. Any other suitable systems or subsystems that perform one or more operations described herein (e.g., one or more development systems for configuring an interactive user interface) can also be implemented by the cloud computing system 700.

In certain embodiments, the cloud computing system 700 may implement the services by executing program code and/or using program data 710, which may be resident in a memory device of the server computers 705 or any suitable computer-readable medium and may be executed by the processors of the server computers 705 or any other suitable processor.

In some embodiments, the program data 710 includes one or more datasets and models described herein. Examples of these datasets include dealership data, classification data, etc. In some embodiments, one or more of data sets, models, and functions are stored in the same memory device. In additional or alternative embodiments, one or more of the programs, data sets, models, and functions described herein are stored in different memory devices accessible via the data network 720.

The cloud computing system 700 also includes a network interface device 715 that enables communications to and from the cloud computing system 700. In certain embodiments, the network interface device 715 includes any device or group of devices suitable for establishing a wired or wireless data connection to the data network 720. Non-limiting examples of the network interface device 715 include an Ethernet network adapter, a modem, and/or the like. The server 730 is able to communicate with the user devices 725 a, 725 b, and 725 c via the data network 720 using the network interface device 715.

While the present subject matter has been described in detail with respect to specific aspects thereof, it will be appreciated that those skilled in the art, upon attaining an understanding of the foregoing, may readily produce alterations to, variations of, and equivalents to such aspects. Numerous specific details are set forth herein to provide a thorough understanding of the claimed subject matter. However, those skilled in the art will understand that the claimed subject matter may be practiced without these specific details. In other instances, methods, apparatuses, or systems that would be known by one of ordinary skill have not been described in detail so as not to obscure claimed subject matter. Accordingly, the present disclosure has been presented for purposes of example rather than limitation, and does not preclude the inclusion of such modifications, variations, and/or additions to the present subject matter as would be readily apparent to one of ordinary skill in the art.

Unless specifically stated otherwise, it is appreciated that throughout this specification discussions utilizing terms such as “processing,” “computing,” “calculating,” “determining,” and “identifying” or the like refer to actions or processes of a computing device, such as one or more computers or a similar electronic computing device or devices, that manipulate or transform data represented as physical electronic or magnetic quantities within memories, registers, or other information storage devices, transmission devices, or display devices of the computing platform. The use of “adapted to” or “configured to” herein is meant as open and inclusive language that does not foreclose devices adapted to or configured to perform additional tasks or steps. Additionally, the use of “based on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Headings, lists, and numbering included herein are for ease of explanation only and are not meant to be limiting.

Aspects of the methods disclosed herein may be performed in the operation of such computing devices. The system or systems discussed herein are not limited to any particular hardware architecture or configuration. A computing device can include any suitable arrangement of components that provide a result conditioned on one or more inputs. Suitable computing devices include multi-purpose microprocessor-based computer systems accessing stored software that programs or configures the computing system from a general purpose computing apparatus to a specialized computing apparatus implementing one or more aspects of the present subject matter. Any suitable programming, scripting, or other type of language or combinations of languages may be used to implement the teachings contained herein in software to be used in programming or configuring a computing device. The order of the blocks presented in the examples above can be varied—for example, blocks can be re-ordered, combined, and/or broken into sub-blocks. Certain blocks or processes can be performed in parallel. 
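
As a non-limiting illustration of how such software might be arranged, the following Python sketch shows one possible incremental training loop in which each training set has a fixed size and is drawn from a growing labeled sample pool, with samples that have been selected more often being less likely to be selected again. The weight form `(1 + count) ** -beta`, the hyperparameter name `beta`, and the names `selection_weights`, `draw_training_set`, `incremental_training`, and `train_step` are assumptions made for this example only and are not taken from the disclosure; the weighted draw without replacement uses the Efraimidis-Spirakis key method, which is likewise one choice among many.

```python
import random


def selection_weights(counts, beta):
    """Unnormalized selection weights (assumed form): (1 + count) ** -beta.

    beta = 0 gives a uniform distribution; a larger beta concentrates the
    probability mass on the samples that have been selected least often.
    """
    return [(1 + c) ** -beta for c in counts]


def draw_training_set(pool, counts, size, beta, rng):
    """Select `size` distinct samples, favoring rarely selected ones."""
    size = min(size, len(pool))
    # Efraimidis-Spirakis keys: u ** (1 / weight); the largest keys win.
    # Here 1 / weight == (1 + count) ** beta, so no division is needed.
    keys = [(rng.random() ** ((1 + c) ** beta), i) for i, c in enumerate(counts)]
    chosen = [i for _, i in sorted(keys, reverse=True)[:size]]
    for i in chosen:
        counts[i] += 1  # record that the sample has been selected once more
    return [pool[i] for i in chosen]


def incremental_training(batches, train_step, size, beta, seed=0):
    """Add each batch to the pool, then train on a freshly drawn set."""
    rng = random.Random(seed)
    pool, counts = [], []
    for new_samples in batches:
        pool.extend(new_samples)
        counts.extend([0] * len(new_samples))
        train_step(draw_training_set(pool, counts, size, beta, rng))
    return pool, counts
```

In this sketch, `train_step` stands in for whatever routine updates the neural network with a training set, and each element of `batches` stands in for the labeled samples that become available at one step. Because the per-step cost depends on `size` rather than on the size of the pool, the pool may keep growing without the training cost of each step growing with it.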

What is claimed is:
 1. A method of incrementally training neural networks comprising: defining, with a computing system, a probability distribution over a labeled sample pool, the labeled sample pool comprising a plurality of labeled samples; generating, with the computing system, a first training set by selecting a first set of labeled samples from the labeled sample pool based on the probability distribution, the first training set having a training set size; training, with the computing system, a neural network with the first training set; adding, with the computing system, one or more additional labeled samples to the labeled sample pool; after the adding, generating, with the computing system, a second training set by selecting a second set of labeled samples from the labeled sample pool based on the probability distribution, the second training set having the training set size; and training, with the computing system, the neural network with the second training set.
 2. The method of claim 1, further comprising: continuously repeating, with the computing system, until a determination to cease continuously repeating, the steps of: adding the one or more additional labeled samples to the labeled sample pool, after the adding, generating another training set by selecting another set of labeled samples from the labeled sample pool based on the probability distribution, the another training set having the training set size; and training the neural network with the another training set.
 3. The method of claim 1, wherein the probability distribution is defined such that the probability of selecting any given labeled sample from the labeled sample pool is inversely proportional to a number of times the any given labeled sample has been previously selected.
 4. The method of claim 1, wherein the probability distribution is a weighted random distribution.
 5. The method of claim 1, wherein the probability distribution comprises a hyperparameter controlling a sharpness of the probability distribution.
 6. The method of claim 5, wherein the hyperparameter is defined in the probability distribution to a value that avoids a uniform distribution over all samples and avoids a one-hot distribution.
 7. The method of claim 1 wherein the defined probability distribution is modified by a fraction of the labeled sample pool, the fraction of the labeled sample pool having a common dataset label.
 8. The method of claim 1 wherein the training set size is a constant.
 9. The method of claim 1 wherein the training set size is a function of a size of the labeled sample pool.
 10. The method of claim 1 wherein the training set size is a function of a size of a training sample pool.
 11. A system for training neural networks comprising: a neural network; a labeled sample pool comprising: at least one training sample; a plurality of labeled samples; a processor; and a system memory, the system memory storing instructions to cause the one or more processors to: define a probability distribution over the labeled sample pool, the labeled sample pool comprising a plurality of labeled samples; generate a first training set by selecting a first set of labeled samples from the labeled sample pool based on the probability distribution, the first training set having a training set size; train the neural network with the first training set; add one or more additional labeled samples to the labeled sample pool; generate a second training set by selecting a second set of labeled samples from the labeled sample pool based on the probability distribution, the second training set having the training set size; and train the neural network with the second training set.
 12. The system of claim 11, wherein the system memory further stores instructions instructing the one or more processors to: continuously repeat until a determination to cease continuously repeating, the steps of: add the one or more additional labeled samples to the labeled sample pool, generate another training set by selecting another set of labeled samples from the labeled sample pool based on the probability distribution, the another training set having the training set size; and train the neural network with the another training set.
 13. The system of claim 11, wherein the probability distribution is defined such that the probability of selecting any given labeled sample from the labeled sample pool is inversely proportional to a number of times the any given labeled sample has been previously selected.
 14. The system of claim 11, wherein the probability distribution is a weighted random distribution.
 15. The system of claim 11, wherein the probability distribution comprises a hyperparameter controlling a sharpness of the probability distribution.
 16. The system of claim 15, wherein the hyperparameter is defined in the probability distribution to a value that avoids a uniform distribution over all samples and avoids a one-hot distribution.
 17. The system of claim 11, wherein the defined probability distribution is modified by a fraction of the labeled example pool, the fraction of the labeled example pool having a common dataset label.
 18. A non-transitory, computer-readable medium comprising instructions that, when executed by a processor, causes the processor to: add at least one training example to a labeled example pool, the labeled pool having labeled training examples; sample from the labeled example pool to build a training set, the training set having a training set size; wherein the sampling is based on a defined probability distribution over the labeled example pool, the defined probability distribution having an inverse relation to the number of times the at least one training example has passed through the model and the defined probability distribution having a variable sharpness of distribution.
 19. The non-transitory, computer-readable medium of claim 18 wherein the probability distribution is defined such that the probability of selecting any given labeled sample from the labeled sample pool is inversely proportional to a number of times the any given labeled sample has been previously selected. 